Skip to content

[action] [PR:16689] Fix system-health hardware_checker to consume fan tolerance details (#16689)#17983

Merged
mssonicbld merged 1 commit intosonic-net:202311from
mssonicbld:cherry/202311/16689
Feb 2, 2024
Merged

[action] [PR:16689] Fix system-health hardware_checker to consume fan tolerance details (#16689)#17983
mssonicbld merged 1 commit intosonic-net:202311from
mssonicbld:cherry/202311/16689

Conversation

@mssonicbld
Copy link
Collaborator

Why I did it

Fan tolerance checking is done through new APIs, is_under_speed and is_over_speed, which populate corresponding fields into the database. speed_tolerance is no longer used and was removed, but system-health was not updated and indicates failures:

ADO: 25279165

root@sonic/# show system-health summary
System status summary

System status LED red_blink
Services:
Status: OK
Hardware:
Status: Not OK
Reasons: Failed to get speed tolerance for fantray5.fan1
Failed to get speed tolerance for fantray5.fan0
Failed to get speed tolerance for fantray4.fan1
Failed to get speed tolerance for fantray4.fan0
Failed to get speed tolerance for fantray3.fan1
Failed to get speed tolerance for fantray3.fan0
Failed to get speed tolerance for fantray2.fan1
Failed to get speed tolerance for fantray2.fan0
Failed to get speed tolerance for fantray1.fan1
Failed to get speed tolerance for fantray1.fan0
Failed to get speed tolerance for fantray0.fan1
Failed to get speed tolerance for fantray0.fan0
Failed to get speed tolerance for PSU1.fan0
Failed to get speed tolerance for PSU0.fan0

How I did it
Updated hardware_checker.py in system-health to consume new is_under_speed and is_over_speed database entries instead of speed_tolerance and hard-coded calculations.

How to verify it
root@sonic:/# show system-health summary
System status summary

System status LED green
Services:
Status: OK
Hardware:
Status: OK

…onic-net#16689)

Why I did it

Fan tolerance checking is done through new APIs, is_under_speed and is_over_speed, which populate corresponding fields into the database. speed_tolerance is no longer used and was removed, but system-health was not updated and indicates failures:

ADO: 25279165

root@sonic/# show system-health summary
System status summary

  System status LED  red_blink
  Services:
    Status: OK
  Hardware:
    Status: Not OK
    Reasons: Failed to get speed tolerance for fantray5.fan1
	     Failed to get speed tolerance for fantray5.fan0
	     Failed to get speed tolerance for fantray4.fan1
	     Failed to get speed tolerance for fantray4.fan0
	     Failed to get speed tolerance for fantray3.fan1
	     Failed to get speed tolerance for fantray3.fan0
	     Failed to get speed tolerance for fantray2.fan1
	     Failed to get speed tolerance for fantray2.fan0
	     Failed to get speed tolerance for fantray1.fan1
	     Failed to get speed tolerance for fantray1.fan0
	     Failed to get speed tolerance for fantray0.fan1
	     Failed to get speed tolerance for fantray0.fan0
	     Failed to get speed tolerance for PSU1.fan0
	     Failed to get speed tolerance for PSU0.fan0

How I did it
Updated hardware_checker.py in system-health to consume new is_under_speed and is_over_speed database entries instead of speed_tolerance and hard-coded calculations.

How to verify it
root@sonic:/# show system-health summary
System status summary

  System status LED  green
  Services:
    Status: OK
  Hardware:
    Status: OK
@mssonicbld
Copy link
Collaborator Author

Original PR: #16689

@mssonicbld mssonicbld merged commit 3b982c0 into sonic-net:202311 Feb 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants